Can complex engineered and biological networks be coarse-grained into smaller and more understandable versions in which each node represents an entire pattern in the original network? To
address this, we define coarse-graining units (CGUs) as connectivity patterns which can serve as the
nodes of a coarse-grained network, and present algorithms to detect them. We use this approach to 
systematically reverse-engineer electronic circuits, forming understandable high-level maps from in- 
comprehensible transistor wiring: first, a coarse-grained version in which each node is a gate made of 
several transistors is established. Then, the coarse-grained network is itself coarse-grained, resulting 
in a high-level blueprint in which each node is a circuit-module made of multiple gates. We apply our 
approach also to a mammalian protein-signaling network, to find a simplified coarse-grained network 
with three main signaling channels that correspond to cross-interacting MAP-kinase cascades. We 
find that both biological and electronic networks are 'self-dissimilar', with different network motifs 
found at each level. The present approach can be used to simplify a wide variety of directed and 
nondirected, natural and designed networks. 



PACS numbers: 05, 89.75

I. INTRODUCTION

In both engineering and biology it is of interest to understand the design of complex networks, a task known as 'reverse-engineering'. In electronics, digital circuits are top-down-engineered starting from functional blocks, which are implemented using logic gates, which in turn are implemented using transistors [4]. Reverse-engineering of an electronic circuit means starting with a transistor map and inferring the gate and block levels. Current approaches to reverse-engineering of electronic circuits usually require prior knowledge of the library of modules used for forward-engineering.
In biology, an increasing number of interaction networks are being experimentally characterized, yet there are few approaches to simplify them into understandable blueprints.
Here we present an approach for simplifying networks by creating coarse-grained networks in which each node is a pattern in the original network. This approach is based on network motifs, significant patterns of connections that recur throughout the network. We define coarse-graining units, CGUs, which can be used as nodes in a coarse-grained version of the network. We demonstrate this approach by coarse-graining an electronic and a biological network.

Definition of CGUs: CGUs are patterns which can 
optimally serve as nodes in a coarse-grained network. 
One can think of CGUs as elementary circuit compo- 
nents with defined input and output ports, and internal 
computational nodes. The set of CGUs comprises a dictionary of elements from which a coarse-grained version
of the original network is built.

The coarse-grained version of the network is a new net- 
work with fewer elements, in which some of the nodes 
are replaced by CGUs. Our approach to defining CGUs is loosely analogous to coding principles and to dictionary text compression techniques. The goal is to choose a set of CGUs that (a) is as small as possible, (b) consists of units that are each as simple as possible, and (c) makes the coarse-grained network as small as possible. These
three properties can be termed 'conciseness', 'simplicity' 
and 'coverage'. Conciseness is defined by the number of 
total CGU types in the dictionary set. Coverage is the 
number of nodes and edges eliminated by coarse-graining 
the network using the CGUs. To define simplicity, we 
describe each occurrence of the subgraph, G, as a 'black 
box'. The black box has input ports and output ports, 
which represent the connections of G to the rest of the 
network, R (Fig 1). There can be four types of nodes in 
G : input nodes that receive only incoming edges from 
R, output nodes that have only outgoing edges to R, in- 
ternal nodes with no connection to R, and mixed nodes 
with both incoming and outgoing edges to R. To obtain 
a minimal loss of information, a coarse-grained version 
of G includes ports, which carry out the interface to the 
rest of the network. The number of ports in the black 
box representing G is:

$$ H = I + O + 2M \tag{1} $$

where I is the number of input nodes, O the number of
output nodes and M the number of mixed nodes (inter- 
nal nodes do not contribute ports and each mixed node 
contributes two ports). The lower the number of ports, 
H, the more 'simple' the CGU. 
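
The node classification and the port count of Eq. 1 can be illustrated with a minimal sketch. The function names `classify_nodes` and `port_count`, the edge-list representation, and the example wiring are assumptions made for illustration; they are not taken from the original implementation.

```python
from collections import Counter


def classify_nodes(edges, subgraph):
    """Label each node of G as I (input), O (output), T (internal) or M (mixed)."""
    sub = set(subgraph)
    receives, sends = set(), set()
    for u, v in edges:
        if u not in sub and v in sub:
            receives.add(v)   # edge from the rest of the network R into G
        elif u in sub and v not in sub:
            sends.add(u)      # edge from G out to R
    labels = {}
    for node in subgraph:
        if node in receives and node in sends:
            labels[node] = "M"
        elif node in receives:
            labels[node] = "I"
        elif node in sends:
            labels[node] = "O"
        else:
            labels[node] = "T"
    return labels


def port_count(labels):
    """Number of ports H = I + O + 2M (Eq. 1); internal nodes add nothing."""
    counts = Counter(labels.values())
    return counts["I"] + counts["O"] + 2 * counts["M"]


# Hypothetical wiring: nodes 1-5 form G, lettered nodes belong to R.
edges_a = [("a", 1), ("b", 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, "c")]
edges_b = edges_a + [("d", 4), (4, "e")]   # node 4 becomes a mixed node
labels_a = classify_nodes(edges_a, [1, 2, 3, 4, 5])
labels_b = classify_nodes(edges_b, [1, 2, 3, 4, 5])
print("".join(labels_a.values()), port_count(labels_a))
print("".join(labels_b.values()), port_count(labels_b))
```

In this toy wiring the two subgraphs have the profiles (I,I,T,T,O) and (I,I,T,M,O), so the single mixed node raises H from 3 to 5.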

After defining simplicity, coverage and conciseness, one can choose the optimal set of CGUs. To do so, we maximize a scoring function which is defined for a set of N CGUs:

$$ S = E_{covered} + \alpha \Delta P - \beta N - \gamma \sum_{i=1}^{N} T_i \tag{2} $$

E_covered is the number of edges covered by all occurrences of the CGUs, and therefore eliminated in the coarse-grained network. N is the number of distinct CGUs, and T_i is the number of internal nodes in the i-th CGU. ΔP is the difference between the number of nodes in the original network and the number of nodes and ports in the coarse-grained network:

$$ \Delta P = P_{covered} - \sum_{i=1}^{N} n_i H_i \tag{3} $$

where P_covered is the number of nodes covered by all occurrences of the CGUs, n_i is the number of occurrences in the network of CGU i, and H_i is the number of ports of CGU i. Using this we obtain:

$$ S = \left[ E_{covered} + \alpha P_{covered} \right] - \left[ \alpha \sum_{i=1}^{N} n_i H_i + \beta N + \gamma \sum_{i=1}^{N} T_i \right] \tag{4} $$

The scoring function has two terms: the first term, corresponding to coverage, represents the simplification gained by coarse-graining, while the second term, corresponding to simplicity and conciseness, quantifies the complexity of the CGU dictionary. Maximizing S favors use of a small set of CGUs, preferentially those that appear often, with many internal nodes and few mixed nodes (since internal nodes do not contribute ports to H and mixed nodes contribute two ports).

FIG. 1: Black box representation of a subgraph and the classes of nodes and ports. The nodes of the subgraph (numbered 1-5) are classified into input (I), output (O), internal (T) and mixed (M) nodes according to the edges that connect them to the rest of the network (dashed arrows). The subgraph is represented as a black box with input and output ports (right side of figure). The complexity measure H is the total number of ports. a. Subgraph with no mixed nodes. The connectivity profile vector is (I,I,T,T,O). b. Subgraph with a mixed node. The connectivity profile vector is (I,I,T,M,O).

The last term in the scoring function, which is the total number of internal nodes in the dictionary, bounds the CGU size and prevents the trivial solution where the entire network is replaced by a single complex node. α, β, γ are parameters that can be set for various degrees of coarse-graining (the results below are insensitive to varying these parameters over 3 orders of magnitude).
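
As a rough illustration of how the score is evaluated, the sketch below assumes that the scoring function reads S = E_covered + αΔP − βN − γ Σ T_i with ΔP = P_covered − Σ n_i H_i. The `CGU` record, the function names and the example counts are hypothetical; only the parameter values (α = 0.2, β = 20, γ = 0.01) are those quoted for CGU set 1 in the Fig 3 caption.

```python
from dataclasses import dataclass


@dataclass
class CGU:
    occurrences: int      # n_i, occurrences of this CGU in the network
    ports: int            # H_i, ports of this CGU
    internal_nodes: int   # T_i, internal nodes of this CGU


def delta_p(p_covered, cgus):
    """Nodes removed minus ports introduced by the coarse-graining."""
    return p_covered - sum(c.occurrences * c.ports for c in cgus)


def score(e_covered, p_covered, cgus, alpha, beta, gamma):
    """Coverage reward minus dictionary-complexity penalties."""
    n_types = len(cgus)
    return (e_covered
            + alpha * delta_p(p_covered, cgus)
            - beta * n_types
            - gamma * sum(c.internal_nodes for c in cgus))


# Hypothetical dictionary: one 5-node, 6-edge CGU occurring 10 times.
dictionary = [CGU(occurrences=10, ports=3, internal_nodes=2)]
s = score(e_covered=60, p_covered=50, cgus=dictionary,
          alpha=0.2, beta=20, gamma=0.01)
print(round(s, 2))
```

Here ΔP = 50 − 30 = 20, so the score is 60 + 4 − 20 − 0.02. Adding a second CGU type to the dictionary only pays off if the edges and nodes it covers outweigh the extra β penalty.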
We use a simulated annealing approach to find the optimal set of CGUs for coarse-graining. There is potentially a huge number of subgraphs that can serve as candidate CGUs. To reduce the number of candidate subgraphs, and to focus on those likely to play functional roles, we consider only subgraphs that occur in the network significantly more often than in randomized networks: network motifs. A candidate set of CGUs is obtained by first detecting all network motifs of 3-6 nodes (Appendix A). The nodes of every occurrence of each motif are classified into one of the 4 types (input/output/internal/mixed). This defines a connectivity profile for each occurrence. For example, the two subgraphs in Fig 1 have the profiles (I,I,T,T,O) and (I,I,T,M,O), where I, T, O and M represent input, internal, output and mixed nodes respectively.
The occurrences of each motif are then grouped together according to their profile to form a CGU candidate [59]. A CGU candidate of n nodes is thus characterized by its topology (an n × n adjacency matrix) and by an n-length profile vector of node classifications (Fig 3).
In the simulated annealing optimization algorithm, each CGU candidate is assigned a random spin variable which is either 1 if all its occurrences participate in the coarse-graining or 0 otherwise. CGU candidates with spin 1 compose the "active set". At each step a spin is randomly chosen and flipped, and the coarse-graining score for the new active set is computed. The active set is updated according to a Metropolis Monte-Carlo procedure.
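
The spin-flip optimization can be sketched as follows. This is a generic Metropolis annealing loop written under stated assumptions, not the authors' code: the starting temperature, the cooling factor, the number of sweeps and the toy score function are all illustrative choices, and `score_fn` stands in for the coarse-graining score of an active set.

```python
import math
import random


def anneal_active_set(candidates, score_fn, t_start=1.0, cooling=0.95,
                      sweeps=200, seed=0):
    """Maximize score_fn over subsets of candidates by flipping 0/1 spins."""
    rng = random.Random(seed)
    spins = [rng.randint(0, 1) for _ in candidates]

    def active(s):
        return [c for c, bit in zip(candidates, s) if bit]

    current = score_fn(active(spins))
    best, best_spins = current, spins[:]
    t = t_start
    for _ in range(sweeps):
        for _ in range(len(candidates)):
            k = rng.randrange(len(candidates))
            spins[k] ^= 1                      # flip one randomly chosen spin
            new = score_fn(active(spins))
            # Metropolis rule for maximization: always accept improvements,
            # accept a lower score with probability exp((new - current) / t).
            if new >= current or rng.random() < math.exp((new - current) / t):
                current = new
                if current > best:
                    best, best_spins = current, spins[:]
            else:
                spins[k] ^= 1                  # reject: undo the flip
        t *= cooling                           # lower the temperature
    return active(best_spins), best


# Toy score (an assumption): each candidate brings a gain and costs 3.
toy_candidates = [("A", 10), ("B", 1), ("C", 5), ("D", 2)]


def toy_score(active_set):
    return sum(gain for _, gain in active_set) - 3 * len(active_set)


chosen, best_score = anneal_active_set(toy_candidates, toy_score)
print(sorted(name for name, _ in chosen), best_score)
```

With this toy score only candidates whose gain exceeds the per-unit cost are worth keeping, which mirrors how the β and γ penalties prune the CGU dictionary.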

Once an optimal set of CGUs is found, a coarse-grained
representation of the original network is formed by re- 
placing each occurrence of a CGU with a node (Appendix 
B). Generally the coarse-grained representation is a hy- 
brid in which some nodes represent CGUs, and other 
nodes are the original nodes. The algorithm can be re- 
peated on the coarse-grained representation, to obtain 
higher levels of coarse-graining. 

Note that the coarse-graining problem is quite different from the well-studied circuit partitioning problem [29], and from the detection of community structure in networks [30, 31, 32]. These algorithms seek to divide networks into subgraphs with minimal interconnections, usually resulting in a set of distinct and rather complex subgraphs. In contrast, coarse-graining seeks a small dictionary of simple subgraph types in order to help understand the function of the network in terms of recurring independent building blocks. An analogy is the detection of words in a text, from which spaces and punctuation marks have been removed, without prior knowledge of the language.

FIG. 2: Transistor level map of an 8-bit binary counter (ISCAS89 circuit S208). Nodes are junctions between transistors, and directed edges represent wire connections. Highlighted is a subgraph that represents the transistors that make up one NOT gate.



II. COARSE-GRAINING OF AN ELECTRONIC 
CIRCUIT 

To demonstrate the coarse-graining approach we analyzed an electronic circuit derived from the ISCAS89 benchmark circuit set [33, 34]. The circuit is a module used in a digital fractional multiplier (S208 [33]). The circuit is given as a netlist of 5 gate types (AND, OR, NAND, NOR, NOT) and a D-flip-flop (DFF). To synthesize a transistor level implementation of this circuit (Fig 2) we first replaced every DFF occurrence with a standard implementation using 4 NAND gates and one NOT gate [4]. All gates were then replaced with their standard transistor-transistor logic (TTL) implementation [35], where nodes represent junctions between transistors (for this purpose resistors and diodes were ignored, as were ground and Vcc). The resulting transistor network (Fig 2) has 516 nodes and 686 edges.
Four CGUs were detected in the transistor level, each with five or six nodes (Fig 3,4a). These patterns correspond to the transistor implementations of the five basic logic gates AND, NAND, NOR, OR and NOT (Fig 4a).

FIG. 3: A partial set of the network motif candidate CGUs for the transistor level network. The number of occurrences of each motif in the transistor network is shown. The optimal CGU dictionary consists of 4 units (solid boxes, CGU set 1: α = 0.2, β = 20, γ = 0.01). A second optimal solution consisting of 2 units, which is found for high values of β, is also shown (dashed boxes, CGU set 2: α = 0.2, β = 500, γ = 0.01). Note that several CGU candidates share the same motif topology. They differ by their connectivity profile vectors (input/output/internal/mixed).



FIG. 4: The CGUs found in the different coarse-grained levels 
of the electronic circuit. At the gate level the CGUs are the 
TTL implementation of AND, OR, NAND, NOR and NOT 
gates (NAND and NOT differ by the type of transistor at 
the input). At the flip-flop level, a single CGU, occurring 
8 times is found. This CGU corresponds to the 5-gate im- 
plementation of a D-flip-flop with an additional gate at the 
input. At the counter level, two CGU topologies are found: 
Seven occurrences of a 3-node feedback loop+mutual edge, 
and one occurrence of a 4-node feedback loop+mutual edge, 
representing CGU4.

FIG. 5: Four levels of representation of the electronic circuit. In the transistor level, nodes represent transistor junctions. In
the gate level, nodes are CGUs made of transistors, each representing a logic gate. Shown is the CGU that corresponds to a 
NAND gate. In the flip-flop level, nodes are either gates or a CGU made of gates that corresponds to a D-type flip-flop with 
an additional logic gate at its input. In the counter level, each node is a gate or a CGU of gates/flip-flops that corresponds to 
a counter subunit. Numbers of nodes (P) and edges (E) at each level are shown. 



These CGUs were used to form a coarse-grained version 
of the network in which each node is a CGU. In this case 
coverage was complete, and all of the original nodes were 
included within CGUs. This network, termed 'gate-level 
network' had 99 nodes and 153 edges. 
We next iterated the coarse-graining process, by apply- 
ing the algorithm to the gate-level network. One CGU 
with six nodes (gates) was detected. This CGU corresponds to a D-flip-flop with an additional logic gate (Fig
4b). A 'flip-flop level' coarse-grained network was then
formed with nodes which were either gates or flip-flops. 
This network had 59 nodes and 97 edges. 
We applied the coarse-graining algorithm again to the 
flip-flop level network. Two types of CGUs were found 
(Fig 4c), which correspond to units of a digital counter. 
Using these CGUs, we constructed the highest-level 
coarse-grained network in which each node is either a 
CGU or a gate. This network, depicted in Fig 5 top 
panel, had 42 nodes and 56 edges. Thus, the highest- 
level coarse-grained network has 12-fold fewer nodes and 
edges than the original transistor-level network. This 
high-level map corresponds to sequential connections of 
binary counter units, each of which halves the frequency 
of the binary stream obtained from the previous unit. 
This map thus describes an 8-bit counter [36].



In other electronic circuits, we find other CGUs, including an XOR built of 4 NAND gates (data not shown). The coarse-graining approach appears to automatically detect favorite modules used by electronic
engineers. 



III. COARSE-GRAINING OF BIOLOGICAL 
NETWORKS 

Recent studies have shown that biological networks contain significant network motifs. Theoretical and experimental studies have demonstrated that each network motif performs a key information processing function. A coarse-grained version of biological networks is of interest because it would provide a simplified representation, focused on these important sub-circuits. However, whereas electronic circuits are composed of exact copies of library units, in biology the recurring black boxes may not be of precisely the same structure. In addition, the characterization of signaling and regulatory networks is currently incomplete due to experimental limitations. Thus a more flexible definition of CGUs is needed [37]. To address these issues we modify our algorithm by allowing each CGU to represent a family of subgraphs, which share a common architectural theme. Thus, the CGUs are probabilistically generalized network motifs (PGNMs): network motifs of different sizes which approximately share a common connectivity pattern.

Probabilistic generalization of network motifs:

To define PGNMs, we must first discuss the concept of block-models [44-46]. A block-model is a compact representation of a subgraph. It consists of two elements: 1) a partition of the subgraph nodes into discrete subsets, called roles [22]; 2) a statement about the presence or absence of a connection between roles (Fig 6). A subgraph of n nodes can be described by an adjacency matrix G, where G_ij = 1 if a directed edge exists from node i to node j, and G_ij = 0 if there is no connection. A block-model partitions the n nodes into m < n roles according to structural equivalence. Two nodes are structurally equivalent if they share exactly the same connections to all other nodes. The block model is an m × m matrix A, where A_IJ = 1 means that all nodes which share role I have a directed connection to all nodes which share role J (Fig 6).

In large subgraphs of real-world networks, perfect struc- 
tural equivalence is not always seen. A block-model can 
still be used as an idealized structure which can be compared to a given subgraph. The distance between a subgraph and a proposed block-model can be defined as

$$ d = \frac{S_W}{S_T} \tag{5} $$

where S_W is the within-block sum of squares

$$ S_W = \sum_{I,J} \sum_{i \in I,\, j \in J} \left( G_{ij} - \langle G_{IJ} \rangle \right)^2 \tag{6} $$

⟨G_IJ⟩ is the mean of the adjacency matrix values in block {I, J}, and S_T is the total sum of squares:

$$ S_T = \sum_{i,j} \left( G_{ij} - \langle G \rangle \right)^2 \tag{7} $$

where ⟨G⟩ = Σ G_ij / n² is the mean value of G. A subgraph with d = 0 is perfectly described by its block model. For example, subgraph G1 in Fig 6 has n = 7 nodes. It can be described by a block-model with m = 2 roles. Nodes 1-4 are assigned the first role and nodes 5-7 are assigned the second role. The distance between the subgraph and the proposed block-model is d = 0.1075. Fig 6 also shows a subgraph, G2, which is far from the proposed block model (d = 0.7538).
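
The distance d = S_W / S_T can be checked numerically with a short sketch. The edge lists below are a reconstruction from the Fig 6 caption (G1: every node of role 1 points to every node of role 2 except for the missing edge 3 → 6; G2: nodes 1, 2 point to nodes 3, 4, which point to nodes 5, 6, 7), so they should be read as assumptions. The function name `block_model_distance` and the 0-based node indexing are likewise illustrative.

```python
def block_model_distance(G, roles):
    """d = S_W / S_T for adjacency matrix G and a partition of nodes into roles."""
    n = len(G)
    s_w = 0.0
    for I in roles:
        for J in roles:
            block = [G[i][j] for i in I for j in J]
            mean = sum(block) / len(block)
            s_w += sum((g - mean) ** 2 for g in block)   # within-block scatter
    entries = [G[i][j] for i in range(n) for j in range(n)]
    g_mean = sum(entries) / (n * n)
    s_t = sum((g - g_mean) ** 2 for g in entries)        # total scatter
    return s_w / s_t


def adjacency(n, edges):
    """Build an n x n 0/1 matrix from 1-based directed edges."""
    G = [[0] * n for _ in range(n)]
    for i, j in edges:
        G[i - 1][j - 1] = 1
    return G


two_roles = [[0, 1, 2, 3], [4, 5, 6]]        # nodes 1-4 and 5-7, 0-based
three_roles = [[0, 1], [2, 3], [4, 5, 6]]    # alternative model for G2

G1 = adjacency(7, [(i, j) for i in range(1, 5) for j in range(5, 8)
                   if (i, j) != (3, 6)])
G2 = adjacency(7, [(i, j) for i in (1, 2) for j in (3, 4)] +
                  [(i, j) for i in (3, 4) for j in (5, 6, 7)])

d1 = block_model_distance(G1, two_roles)
d2 = block_model_distance(G2, two_roles)
d2_alt = block_model_distance(G2, three_roles)
print(round(d1, 4), round(d2, 4), d2_alt)
```

Under these assumed edge lists the sketch reproduces the quoted distances of 0.1075 and 0.7538, and the three-role model fits G2 with d = 0. Note that the sums run over all n² matrix entries, including the diagonal.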
Finding the best block-model to fit arbitrary connectivity data is a combinatorially complex problem [44, 45, 46], requiring exhaustive testing of different assignments of nodes to roles. However, an efficient algorithm to detect PGNMs can be formed based on the fact that small network motifs in biological networks aggregate to form network motif topological generalizations [22, 47]. Topological generalizations are subgraphs obtained from smaller network motifs, by replicating one or more of their roles, together with their connections (Fig 7). An algorithm to detect PGNMs is described in Appendix C.
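
Role replication can be illustrated on the bifan. The sketch below is only an illustration: the helper names `replicate_role` and `fits_block_model` and the list-of-lists adjacency representation are assumptions, and the new node simply copies the connections of the first node that holds the replicated role.

```python
def replicate_role(adj, roles, role_index):
    """Add one node that copies the connections of the chosen role."""
    template = roles[role_index][0]
    new_node = len(adj)
    # Every existing node points to the new node iff it points to the template.
    out = [row[:] + [row[template]] for row in adj]
    # The new node copies the template's outgoing connections.
    out.append(adj[template][:] + [adj[template][template]])
    new_roles = [members[:] for members in roles]
    new_roles[role_index].append(new_node)
    return out, new_roles


def fits_block_model(adj, roles):
    """True if every role-to-role block is uniformly 0 or uniformly 1."""
    return all(len({adj[i][j] for i in I for j in J}) == 1
               for I in roles for J in roles)


# Bifan: nodes 0, 1 (role 1) each point to nodes 2, 3 (role 2).
bifan = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
bifan_roles = [[0, 1], [2, 3]]

g_in, roles_in = replicate_role(bifan, bifan_roles, 0)    # extra input node
g_out, roles_out = replicate_role(bifan, bifan_roles, 1)  # extra output node
print(roles_in, fits_block_model(g_in, roles_in))
print(roles_out, fits_block_model(g_out, roles_out))
```

Both generalized subgraphs keep a perfect fit to the two-role block model of the bifan, which is the property that the role-replication operation is meant to preserve.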

To determine the optimal dictionary of CGUs, including the PGNMs, we use the following modified version of the scoring function of Eq 2:

$$ S = E_{covered} + \alpha \Delta P - \beta N - \gamma \sum_{i=1}^{N} T_i - \sum_{i \in \{CGU_g\}} d_i \tag{8} $$

N, the number of CGUs, is the number of basic motifs used. CGU_g includes the set of all PGNMs based on the CGUs. Each CGU can give rise to several PGNMs of
different sizes [62].

FIG. 6: A block-model (top) and two subgraphs, one which fits the block-model (G1, bottom left) and one which does not (G2, bottom right). G1 has 7 nodes and 2 roles (nodes 1-4 share role 1 and nodes 5-7 share role 2). Its adjacency matrix is shown below, with lines indicating the block-model partition. An edge between node 3 and node 6 is missing for a perfect fit to the proposed block-model. The distance between the block matrix and the adjacency matrix is d = 0.1075. The right subgraph, G2, does not fit the proposed block-model A. The distance between the block matrix and the adjacency matrix is d = 0.7538. An alternative block-model with 3 roles, ({1, 2}, {3, 4}, {5, 6, 7}), would perfectly fit this subgraph, with d = 0. Both of these subgraphs are aggregates of a 4-node bifan subgraph (Fig 7).

FIG. 7: Topological generalizations of the bifan subgraph and their adjacency matrices. The bifan subgraph has two roles: nodes 1,2 share role 1 and nodes 3,4 share role 2. Lines indicate the block-model partition. Below are two generalized subgraphs obtained by role replication [22]. Subgraph G1 (left) is obtained by replicating the first role, with its connections. Subgraph G2 (right) is obtained by replicating the second role, with its connections. Adjacency matrices and block-model partitions are shown. The role-replication operation extends a subgraph while keeping a perfect fit to its block-model.

FIG. 8: A network of signal-transduction pathways in mammalian cells.

IV. CGUS IN A PROTEIN-SIGNALING NETWORK

We analyzed a database of mammalian signal transduction pathways based on the Signal Transduction Knowledge Environment [50]. This dataset contains 94 proteins and
209 directed interactions (Fig 8). The optimal coarse-graining
is based on a single motif, the 4-node bifan
(Fig 9). Thus N = 1. We find 9 occurrences of PGNMs 
based on the bifan, labelled CGU0-CGU8 which share 
a common design consisting of a row of input nodes 
with overlapping interactions to a row of output nodes 
(Fig 9). The input and output rows in these CGUs 
sometimes represent proteins from the same sub-family 
(eg. JNK1,JNK2 and JNK3 in CGU 3), and in other 
cases they represent proteins from different sub-families 
(ERK and p38 in CGU 6). This type of structure allows 
hard-wired combinatorial activation and inhibition of 
outputs. Similar structures were described in transcription regulation networks ('dense overlapping regulons').

Using this CGU, the signaling network can be coarse- 
grained (Fig 10a), showing three major signaling 
channels (Fig 10b). These channels correspond to 
the well-studied ERK, JNK and p38 MAP-kinase 
cascades, which respond to stress signals and growth factors.

Each channel is made of three CGUs in a cascade. 
In each cascade, the top and bottom CGUs contain 
only positive (kinase) interactions, and the middle 
CGU contains both positive and negative (phosphatase) 
interactions. The p38 and ERK channels intersect at CGU 6. The MAPK phosphatase 2 (MKP2) participates in both the JNK pathway (CGU2) and the ERK pathway (CGU8), whereas MAPK phosphatase 5 (MKP5) participates in both the JNK pathway (CGU2) and the p38 pathway (CGU5). The MAPKKKs ASK1 and TAK1 are shared by both the JNK pathway (CGU1) and the p38 pathway (CGU4).



V. SELF-DISSIMILARITY OF NETWORK 
STRUCTURE 

Interestingly, the coarse-grained signaling network 
displays a different set of network motifs than the 
original network, with prominent cascades (Fig 10c). 
Similarly, the electronic network displayed different 
CGUs at each level (Fig 4). These networks are therefore self-dissimilar [57, 58]: the structure at each level
of resolution is different.



FIG. 9: CGUs in the signal-transduction network. One CGU is found, the 4-node bifan with 9 PGNMs, numbered CGU0-CGU8. Solid arrows represent positive (kinase) interactions, dashed arrows represent negative interactions (phosphatase). Empty circles represent duplicated nodes (nodes which participate in more than one PGNM). K, K^2, K^3 and K^4 represent MAP-kinase, kinase-kinase, kinase-kinase-kinase etc.

FIG. 10: a. Coarse-grained version of the signal-transduction network. Three signaling channels made of cascades of the CGU occurrences are highlighted. Solid arrows represent positive (kinase) interactions, dashed arrows represent negative interactions (phosphatase). EGFR and PKA have been drawn more than once for clarity. b. The three signaling channels. c. The network motifs found at the two levels.

VI. DISCUSSION

We presented an approach for coarse-graining networks in which a complex network can be represented by a compact and more understandable version. Performing an optimization on the space of network motifs of different sizes, we found optimal units for coarse-graining, CGUs, which allow a maximal reduction of the network, while keeping a concise and simple dictionary of elements.
We demonstrated that this method can be used to fully 
reverse-engineer electronic circuits, from the transistor 
level to the highest module level, without prior knowl- 
edge of the library components used to create them. 
For biological networks, where modularity may be less 
stringent than electronic circuits, we modified the algo- 
rithm to seek a coarse-grained network, using a small 
set of structures of different sizes that form probabilis- 
tically generalized network motifs. Using this approach, 
a coarse-grained version of a mammalian signaling net- 
work was established, using one CGU composed of cross- 
activating MAP-kinases. In the coarse-grained network 
one can easily visualize intersecting signalling pathways 
and feedback loops. The present approach allows a sim- 
plified coarse-grained view of this important signaling 
network, showing the major signaling channels, and spec- 
ifies the recurring circuit element (CGU) that may char- 
acterize protein signaling pathways in other cellular sys- 
tems and organisms. 

Biological and electronic networks are both self-dissimilar [57, 58], showing different network motifs on
different levels. This contrasts with views based on 
statistical-mechanics near phase-transition points which 
emphasize self-similarity of complex systems. 
It is important to stress that not every network can be 
effectively coarse-grained, only networks with particular 
modularity and topology. The method can readily be 
applied to nondirected networks. It would be interesting 
to apply this approach to additional biological networks, 
to study the systems-level function of each CGU, and to 
study which networks evolve to have a topology that can 
be coarse-grained.

APPENDIX A: DETECTION OF NETWORK MOTIFS USING RANDOMIZED NETWORKS THAT PRESERVE CLUSTERING SEQUENCES

The set of candidate CGUs should ideally be the complete set of subgraphs of different sizes found in the network. The complete set of subgraphs is, however, too large for the optimization procedure to work effectively in practice (there are 199 4-node connected directed subgraph types, 9,364 5-node subgraph types, 1,530,843 6-node subgraph types etc., a significant fraction of which actually occur in the real networks). Due to computational limitations, we considered in the present study only a small subset of the subgraphs, those which occur significantly more often in the network than in suitably randomized networks. These subgraphs are termed network motifs.

For the detection of network motifs we considered two randomized ensembles: (1) random networks in which each node preserves the number of incoming, outgoing and mutual edges (edges that run in both directions) connected to it in the real network; (2) random networks in which each node preserves the number of incoming, outgoing and mutual edges connected to it in the real network, and in addition each node preserves the clustering coefficient of that node in the real network. The detection of network motifs using ensemble (1) as a null hypothesis has been described previously. The random networks created this way often have a different clustering coefficient for each node than in the real network. As a result, the number of nondirected triangles in the real network is generally different from the randomized network ensemble (either higher, as in the transistor network, or lower, as in the protein signaling network). To control for this, in the more stringent ensemble (2), we also preserve the clustering coefficient of each node ("clustering sequence"), using a simulated annealing algorithm.

To create such an ensemble of randomized networks we first randomize the real network with a Markov-chain Monte-Carlo algorithm, which successively selects two node pairs and performs a "switch", rewiring their edges, as described in [20, 24]. To define the clustering sequence of a network of N nodes, {C_i}, i = 1..N, we treat its nondirected version:

$$ C_i = \frac{2 n_i}{K_i (K_i - 1)} \tag{A1} $$

K_i is the number of edges connected to node i (which represent either incoming, outgoing or mutual edges in the directed version). n_i is the number of triangles connected to node i. Denoting the clustering sequence of the random networks by {C_i^r}, i = 1..N, we carry out switches, again randomly selecting pairs of edges and rewiring them, but this time with probability:

$$ e^{-\Delta E / T} \tag{A2} $$

where T is an effective temperature, lowered by a factor of 5% between sweeps, and E, the energy function, is the distance between the clustering sequences of the real and random networks:

$$ E = \sum_{i=1}^{N} \frac{\left| C_i - C_i^r \right|}{C_i + C_i^r} \tag{A3} $$
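
The quantities used in this annealing step can be sketched in a few lines. The conventions below are assumptions that the text does not specify: nodes with fewer than two neighbours get C_i = 0, terms of the energy with C_i + C_i^r = 0 are skipped, and a switch that lowers the energy is always accepted. The function names and the toy network are illustrative.

```python
import math
import random
from itertools import combinations


def clustering_sequence(edges, nodes):
    """C_i = 2 n_i / (K_i (K_i - 1)) on the nondirected version of the network."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:                       # direction and mutual edges collapse
            nbrs[u].add(v)
            nbrs[v].add(u)
    seq = {}
    for v in nodes:
        k = len(nbrs[v])
        if k < 2:
            seq[v] = 0.0                 # assumed convention for K_i < 2
            continue
        triangles = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        seq[v] = 2.0 * triangles / (k * (k - 1))
    return seq


def energy(c_real, c_rand):
    """Distance between two clustering sequences (Eq. A3)."""
    total = 0.0
    for v in c_real:
        denom = c_real[v] + c_rand[v]
        if denom > 0:                    # skip 0/0 terms (assumed convention)
            total += abs(c_real[v] - c_rand[v]) / denom
    return total


def accept_switch(delta_e, temperature, rng):
    """Metropolis acceptance with probability exp(-delta_e / T) (Eq. A2)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)


# Toy directed network: a triangle 1-2-3 with a pendant node 4.
toy_nodes = [1, 2, 3, 4]
toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
c_real = clustering_sequence(toy_edges, toy_nodes)
c_flat = {v: 0.0 for v in toy_nodes}     # a randomization with no triangles
print(c_real, energy(c_real, c_flat))
print(accept_switch(-0.5, 0.1, random.Random(0)))
```

In a full randomization loop, each proposed edge switch would recompute the clustering sequence of the rewired network and pass the resulting change in energy to `accept_switch`, with the temperature lowered between sweeps.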



The random networks obtained have precisely the same clustering sequence and degree sequences as the real network. They are thus more constrained than in ensemble (1). In the presently studied networks, they contain almost precisely the same number of nondirected triangles as the real network. However, the numbers of directed triangle subtypes differ from the real network. There are 7 types of directed 3-node triangle subgraphs. The relative abundance of these 7 subgraphs in the random ensemble is determined by different moments of the degree sequences. Thus, 3-node directed subgraphs can still be found as motifs using ensemble (2), depending on the network degree sequences. For the transistor network and signalling network studied, the two sets of network motifs of 3-6 nodes detected using ensembles (1) and (2) had an overlap of more than 90%. Using ensemble (2) on the transistor network results in somewhat fewer motifs that are triangles with dangling edges, and more tree-like motifs than ensemble (1). Using ensemble (2) on the protein signaling network results in somewhat fewer tree-like motifs. For both networks, the coarse-graining algorithm detected the same optimal sets of CGUs using either ensemble. Thus, in the present examples, coarse-graining is not affected by choice of random network ensemble.



APPENDIX B: OVERLAP RULES

The desired CGU set should have minimal overlap (shared nodes) between occurrences of the CGUs. In cases where shared nodes are necessary, the CGU partitioning should be such that the shared nodes do not affect the function of each CGU. The solutions that maximize Eq. 2 or Eq. 8 often have significant overlap between the CGUs. Here we describe rules that disqualify solutions in which overlap would interfere with the coarse-grained representation. We also describe how an acceptable CGU partitioning is performed in cases where overlap is allowed.

Once a set of CGUs which maximizes the scoring function is found, it is tested for the following criterion: allowed solutions are those in which each overlapping node receives inputs from only one CGU (Fig 11). CGU sets which do not meet this criterion are disqualified, and a new set is sought. (Note that the overlapping nodes are allowed to send outputs to both CGUs.) In acceptable CGU sets, in every case of an overlap, the overlapping node is duplicated, and appears once in each of the CGU occurrences. The acceptance criterion above ensures that the inputs to the duplicated nodes can be fully captured by one of the CGUs, thus ensuring that the function of the coarse-grained network can be inferred from the functions of individual CGUs. Finally, in cases where the overlapping node only sends output to the CGUs and does not receive inputs from them, an additional node is created in the coarse-grained network. This node has all of the connections of the original node, and sends outputs to the duplicated node in each CGU (e.g. MKP2 and MKP5 in Figs 9, 10 and Fig 11c).

FIG. 11: Overlap rules of CGU candidates. In these examples the CGU candidates are a. A 3-node feed-forward loop (left) and a 4-node diamond subgraph (right). b. Overlap of nodes which receive inputs from only one of the CGUs (left), and coarse-grained representation (right). c. Overlap of nodes which send outputs to two CGUs (left), and coarse-grained representation (right). Note the addition of a node upstream of the two CGUs, marked with an open circle. d. Two examples of disqualified cases, where a node receives inputs from both CGUs: two CGUs with a common edge (left), and without a common edge (right).

APPENDIX C: ALGORITHM FOR DETECTING PGNMS

Topological generalizations are subgraphs obtained from smaller network motifs, by replicating one or more of their roles, together with their connections [22]. The role-replication operation does not increase the number of roles in the resulting generalized subgraph, which maintains a perfect fit to the block model of the network motif. Additionally, each node has the same role in both the generalized subgraph and in every occurrence of the basic motif included in it. The role assignment is thus automatically defined. Probabilistically generalized network motifs (PGNMs) are subgraphs which have a small distance d (Eq. 5) from their block model.
To detect PGNMs we start with a network motif μ. The nodes of each occurrence of μ are partitioned into roles [22]. We then form a nondirected graph R_μ in which each node is an occurrence of μ in the original network R, and a nondirected edge between two nodes is set if a) any of the nodes of these occurrences in the original network R are connected by an edge, or b) any of the nodes in the original network overlap. After establishing R_μ we start from each node and perform a search, consecutively adding the one node in R_μ which provides the best fit to the block model of μ (the resulting joined subgraph with the smallest increase in d). We stop when d is greater than a threshold (we use 0.3). When calculating the fit to the block-model, we partition the nodes of the joined subgraph according to their role assignment in μ. If a node in R has different roles in two different occurrences of μ, when calculating d for the joined subgraph, we take the smallest distance obtained from all possible labellings of this node (for example, nodes 3, 4 in subgraph G2 of Fig 6 share role 1 in the bifan (3, 4, 5, 6) and role 2 in the bifan (1, 2, 3, 4)). We iterate this procedure by beginning with each node of R_μ, establishing a list of embedded subgraphs (if two embedded structures have the same d we keep only the larger one). These subgraphs are probabilistic generalizations of μ, tagged by their distance from a perfect generalization, d. In finding the optimal coarse-graining we perform a simulated annealing algorithm, sequentially generating a new active set of CGUs, recalculating the scoring function (Eq. 8) and accepting the new active set with a Metropolis Monte-Carlo probability. During the optimization, we also test the resulting score from coarse-graining only subsets of the PGNMs of each CGU. For an alternative definition of probabilistic network motifs see [33].
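
The greedy growth of a PGNM from bifan occurrences can be sketched as follows. This is a simplified illustration, not the authors' implementation: occurrences whose role assignment conflicts with the growing subgraph are skipped rather than resolved by testing all labellings, the occurrence graph is not built explicitly (linkage is tested on the fly), and the names `distance` and `grow_pgnm` and the example network are assumptions. The example reuses the assumed G1 wiring of Fig 6 (all edges from nodes 1-4 to nodes 5-7 except 3 → 6).

```python
from itertools import combinations


def distance(edges, roles):
    """d = S_W / S_T for the subgraph induced on the nodes listed in roles."""
    nodes = [v for members in roles for v in members]

    def g(i, j):
        return 1.0 if (i, j) in edges else 0.0

    s_w = 0.0
    for I in roles:
        for J in roles:
            block = [g(i, j) for i in I for j in J]
            mean = sum(block) / len(block)
            s_w += sum((x - mean) ** 2 for x in block)
    entries = [g(i, j) for i in nodes for j in nodes]
    mean = sum(entries) / len(entries)
    s_t = sum((x - mean) ** 2 for x in entries)
    return s_w / s_t if s_t else 0.0


def grow_pgnm(edges, occurrences, start, threshold=0.3):
    """Greedily join motif occurrences while d stays within the threshold."""
    assignment = dict(occurrences[start])     # node -> role index (0 or 1)
    used = {start}
    while True:
        best = None
        for k, occ in enumerate(occurrences):
            if k in used:
                continue
            linked = any(v in assignment for v in occ) or any(
                (u, v) in edges or (v, u) in edges
                for u in occ for v in assignment)
            conflict = any(assignment.get(v, r) != r for v, r in occ.items())
            if not linked or conflict:        # simplification: skip conflicts
                continue
            trial = {**assignment, **occ}
            roles = [[v for v, r in trial.items() if r == x] for x in (0, 1)]
            d = distance(edges, roles)
            if best is None or d < best[0]:
                best = (d, k, trial)
        if best is None or best[0] > threshold:
            break
        _, k, assignment = best
        used.add(k)
    roles = [sorted(v for v, r in assignment.items() if r == x) for x in (0, 1)]
    return roles, distance(edges, roles)


# Assumed example network: G1 of Fig 6.
net = {(i, j) for i in range(1, 5) for j in range(5, 8)} - {(3, 6)}
bifans = [{a: 0, b: 0, c: 1, d: 1}
          for a, b in combinations(range(1, 5), 2)
          for c, d in combinations(range(5, 8), 2)
          if {(a, c), (a, d), (b, c), (b, d)} <= net]
pgnm_roles, pgnm_d = grow_pgnm(net, bifans, start=0)
print(len(bifans), pgnm_roles, round(pgnm_d, 4))
```

Starting from the bifan (1, 2, 5, 6), the search first absorbs the occurrences that keep d = 0 and then the ones involving node 3, ending with the full seven-node subgraph because its distance stays below the 0.3 threshold.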